Consistent Checkpoints of PVM Applications
Author
Abstract
PVM is currently the standard for developing parallel applications in workstation environments. One of its goals is to use the computational power of idle workstations. In practice, however, many users refrain from opening their machines to other users' PVM processes. In their experience, such a process usually requires substantial resources (CPU and memory) and increases the response time of short interactive commands, which reduces their own productivity. In the following, we describe an approach that enhances existing queuing systems with support for parallel PVM applications and allows PVM processes to migrate to other machines. Hence, users who start to work interactively on their machines are no longer bothered by resource-consuming PVM processes.
Similar resources
Resource Management and Checkpointing for PVM
Checkpoints can be used not only to increase fault tolerance, but also to migrate processes. Migration is particularly useful in workstation environments where machines become dynamically available and unavailable. We introduce the CoCheck environment, which not only allows the creation of checkpoints, but also provides process migration. The creation of checkpoints of PVM applications is exp...
Proving Properties of PVM Applications - A Case Study with CoCheck
This paper reports the results of a case study in which we applied a formal method to prove properties of CoCheck, an extension of PVM for the creation of checkpoints of parallel applications on workstation clusters. Although the functionality of CoCheck had been demonstrated in experiments, there was no proof of the desired properties. Consequently, a formal method had to be applied that makes it possible to prove those prop...
State Based Visualization of PVM Applications
Understanding the dynamic behavior of parallel programs is a critical issue both for debugging and for optimization. A visualization tool displaying an animated sequence of the global states the program runs through offers valuable support for this process. The paper presents the features and the implementation of VISTOP, a state based visualizer for PVM applications. It supports program flow v...
Fail-safe PVM: A portable package for distributed programming with transparent recovery
Many scientific problems benefit from computations that are parallel at a coarse grain. Collections of loosely coupled, heterogeneous computers are increasingly being applied to these problems. While individual computers are designed to be relatively reliable, a collection of several autonomous machines necessarily has a greater rate of failure. As data networks improve, and larger multicomputer...
Extended mpiJava for Distributed Checkpointing and Recovery
In this paper we describe an mpiJava extension that implements a parallel checkpointing/recovery service. This checkpointing/recovery facility is transparent to applications, i.e. no instrumentation is needed. We use a distributed approach for taking the checkpoints, which means that the processes take their local checkpoints independently. This approach reduces communication between processes ...